Tuning an Existing Nomenclature for Specific Domain Corpora: A Syntax-Based Similarity Method

Authors

  • Pierre Zweigenbaum
  • Benoit Habert
  • Adeline Nazarenko
  • Jacques Bouaud
Abstract

There is a constant need to extend and tune medical vocabularies to account for new words and new word usages. Robust natural language processing (NLP) tools can be applied to corpora of medical texts, such as patient narratives, to help collect and analyze unknown words [1,2]. The aim of the present work is to assess the potential for classifying unknown words based on the semantic categories of “neighbors” identified through syntactic distributional properties [3].
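The abstract does not spell out the similarity measure or the classification rule, so the following is only an illustrative sketch of the general idea: an unknown word is assigned the semantic category that dominates among the known words sharing its syntactic contexts. Cosine similarity over counts of (relation, word) contexts, the similarity-weighted vote, the function names `cosine` and `classify_unknown`, and all the toy data are assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of syntactic contexts."""
    dot = sum(a[key] * b[key] for key in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_unknown(word, contexts, categories, k=3):
    """Return the category with the highest similarity-weighted vote among
    the k nearest known neighbors of `word`, or None if no neighbor shares
    a context with it."""
    target = contexts[word]
    scored = [
        (cosine(target, contexts[known]), known)
        for known in categories
        if known in contexts and known != word
    ]
    # Keep only neighbors that share at least one context, most similar first.
    neighbors = sorted((pair for pair in scored if pair[0] > 0), reverse=True)[:k]
    if not neighbors:
        return None
    votes = Counter()
    for score, known in neighbors:
        votes[categories[known]] += score
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each word maps to counts of (relation, word) contexts,
# as a parser might produce them. None of this comes from the paper.
contexts = {
    "stenosis": Counter({("mod_by", "severe"): 2, ("obj_of", "show"): 1, ("of", "artery"): 2}),
    "lesion": Counter({("mod_by", "severe"): 1, ("obj_of", "show"): 2, ("of", "artery"): 1}),
    "occlusion": Counter({("mod_by", "severe"): 1, ("of", "artery"): 2}),
    "aspirin": Counter({("obj_of", "prescribe"): 3, ("mod_by", "daily"): 1}),
    "thrombosis": Counter({("mod_by", "severe"): 1, ("obj_of", "show"): 1, ("of", "artery"): 1}),
    "ibuprofen": Counter({("obj_of", "prescribe"): 1}),
    "protocol": Counter({("subj_of", "require"): 1}),
}

# Hypothetical existing nomenclature: known words and their semantic categories.
categories = {
    "stenosis": "disorder",
    "lesion": "disorder",
    "occlusion": "disorder",
    "aspirin": "drug",
}

print(classify_unknown("thrombosis", contexts, categories))
print(classify_unknown("ibuprofen", contexts, categories))
```

In this sketch, a word that shares no context with any known word is left unclassified (`None`) rather than forced into a category, which seems the safer default when tuning a curated nomenclature.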

Similar resources

An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation

In this paper, we propose a novel domain adaptation method named “mixed fine tuning” for neural machine translation (NMT). We combine two existing approaches, namely fine tuning and multi-domain NMT. We first train an NMT model on an out-of-domain parallel corpus, and then fine tune it on a parallel corpus which is a mix of the in-domain and out-of-domain corpora. All corpora are augmented with a...

Measures of semantic similarity and relatedness in the biomedical domain

Measures of semantic similarity between concepts are widely used in Natural Language Processing. In this article, we show how six existing domain-independent measures can be adapted to the biomedical domain. These measures were originally based on WordNet, an English lexical database of concepts and relations. In this research, we adapt these measures to the SNOMED-CT ontology of medical concep...

Exploitation of semantic similarity for adaptation of existing terminologies within biomedical area

We present a novel method for adaptation of existing terminologies. Within the biomedical domain, and when no textual corpora for building terminologies are available, we exploit the UMLS Metathesaurus, which merges over a hundred existing biomedical terminologies and ontologies. We also exploit algorithms for measuring semantic similarity in order to limit, within UMLS, a semantically homogeneous sp...

NLPCC 2016 Shared Task Chinese Words Similarity Measure via Ensemble Learning Based on Multiple Resources

Many Chinese words similarity measure algorithms have been introduced since it’s a fundamental issue in various tasks of natural language processing. Previous work focused mainly on using existing semantic knowledge bases or large-scale corpora. However, knowledge base and corpus have limitations for broad coverage and data update. Thus, ensemble learning is then used to improve performance by ...

Web-Based Semantic Similarity: An Evaluation in the Biomedical Domain

Computation of semantic similarity between concepts is a very common problem in many language related tasks and knowledge domains. In the biomedical field, several approaches have been developed to deal with this issue by exploiting the structured knowledge available in domain ontologies (such as SNOMED-CT or MeSH) and specific, closed and reliable corpora (such as clinical data). However, in r...

Publication date: 1998